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Abstract 

This paper develops a method for estimating parameters of a vector au- 
toregression (VAR) observed in white noise. The estimation method assumes 
the noise variance matrix is known and does not require any iterative pro- 
cess. This study provides consistent estimators and shows the asymptotic 
distribution of the parameters required for conducting tests of Granger causal- 
ity. Methods in the existing statistical literature cannot be used for testing 
Granger causality, since under the null hypothesis the model becomes uniden- 
tifiable. Measurement error effects on parameter estimates were evaluated 
by using computational simulations. The results show that the proposed ap- 
proach produces empirical false positive rates close to the adopted nominal 
level (even for small samples) and has a good performance around the null 
hypothesis. The applicability and usefulness of the proposed approach are 
illustrated using a functional magnetic resonance imaging dataset. 

Key Words: Asymptotic property, error s-in-variables model, Granger causal- 
ity, multivariate analysis. 



1 INTRODUCTION 

Multivariate time series modeling is an important component for the quantitative 
assessment of relationships between variables in many applied areas. This issue is 
essential in financial applications, for example, enabling optimal portfolio allocation, 
setting trading strategies over sectors of the market, or exchanging rates (Sims, 1980; 
Ni and Sun, 2003). In addition, the vector autoregressive model (VAR) is widely used 
in many fields such as economics (Granger, 1969), geophysics (Liu and Rodriguez , 
2005), bioinformatics (Fujita et al., 2007a) and neuroscience (Goebel et al., 2003). 
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The main reasons for the attractiveness of the VAR model in applied areas are 
its simplicity and relation with the concept of Granger causality (Granger, 1969). 
Granger causality has become a prominent concept in connectivity networks model- 
ing, because it provides inferences about the direction of information flow between 
different time series. Several studies in biological systems emphasize the importance 
of identification and description of gene regulator networks (Gottesman, 1984; Ka- 
toh , 2007), mainly in the study of tumors or structural diseases. Mukhopadhyay 
and Chatterjee (2007); Fujita et al. (2007a,b) introduced the utilization of VAR- 
based models to study these issues by applying these models to gene expression 
datasets. In Neuroscience, the functional integration theories highlight that brain 
functions heavily depend on neural connectivity networks (Cohen and Tong, 2001). 
Several neuroimaging studies (Goebel et al., 2003; Sato et al., 2006; Abler et al., 
2006) suggested that VAR models and Granger causality are suitable to identify 
the information flow between neural structures. Nevertheless, it is well known that 
most biological measurements are subject to error, since the precision of acquisition 
equipments is never absolute. Actually, this limitation is present in most studies 
involving experimental data, such as chemistry, physics, biometrics, etc. 

Although technically incorrect, the most common procedure is simply to ignore 
the measurement errors, i.e.: assume that the variables of interest are the observed 
variables. It is important to highlight that this assumption has serious implications. 
The utilization of conventional VAR model in this case would not identify correctly 
the relationships between the variables of interest (latent variables). It happens 
because the model white noise will not be independent which leads to misestima- 
tions of the model parameters. The usual assumption is acceptable when the errors 
are negligible. However, it is known that due to acquisition processes limitations, 
the measurement errors in biology (e.g.: gene expressions or brain signals) are not 
negligible. In these cases, the utilization of conventional VAR models may result 
in biased parameter estimation and as a consequence, unreliable Granger causality 
detection. 

In the following, we define the usual VAR model (for a more detailed description, 
see for instance, Lutkepohl, 2005). Let z t = (z u , ■ ■ ■ , z pt ) T denotes a(pxl) vector 
of time series variables. The usual VAR(r) model has the form 

z t = a + BxZt-i + . . . + B r z t „ r + q t , t = 1, • • • , n (1) 

where n is the sample size, Bj for j = 1, . . . , r are (p x p) coefficient matrices and 
q t is a (p x 1) unobservable zero mean white noise vector process with covariance 
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matrix X. For convenience, we consider that Z\ = for all / < 0. We are assuming 
throughout this paper that model (1) satisfies the stability condition defined in 
Liitkepohl (2005) on page 12. Therefore, under stationarity conditions, the mean 
and the autocovariance function are given, respectively, by 

E(z t ) = Hz = ( I P - B i ) a ' 

r 

7 (/i) = E[(z t - n z )(z t _ h - n z ) T ] = B M h -3), for h = 1, 2, 3, . . . 

and 

r 

7 (0) = ^S, 7 (-j) + S 

i=i 

where I p denotes the p x p identity matrix and 'y(-j) = j(j) T ■ 
Model (1) can be written in short as 

zt = a + Bz* t _ x + q t , t = 1, • • • , n (2) 

where B = {Bi B 2 . . . B r ) is a p x pr matrix and z*_ x = (zj_ 1: zj_ 2 , . . . , zj_ r ) r . 

Therefore, if the white noise has normal distribution, the conditional Maximum 
Likelihood (ML) estimators of a, B and S are equal to the ordinary least squares 
estimators. They are given, respectively, by 

n 

Kl =*t- B ML z* t -i, B ML = (S-^S^^y and S ML = n" 1 J] q t qj (3) 

i=i 

where z* t _ x = n~ l YT i= i z *-n *t = n' 1 YT i= i z u % = *i ~ a ML - B ML z*_ 1: S zU = 
n- 1 Er=i«i - *ViXi T and S z *_^ = n" 1 £ILi«i " *Vi)«i T - 

The consistence of those conditional ML estimators is assured under the station- 
ary conditions (see Liitkepohl, 2005, for further details). The consistence is shown 
using the fact that 



z t Hz, z* t -i Hz* = lr®M z , S z * t _ x r r (0) and S z *_ iZt T r (0)B 

where " — >" denotes convergence in probability when the sample size increases, <g> 
denotes the Kronecker product, l r is a r— dimensional column vector of ones, and 
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y(h + r 
7(/i + r 



1) 

2) 



/ y(/i — r + 1) 



j(h-r + 2) 



As described previously, VAR modeling is commonly applied for detecting Granger 
causality relationships. The basic idea of Granger causality is the evaluation of 
temporal information founded on the assumption that the cause always precedes 
its effect (Granger, 1969). Let Xt and yt be two time series. From the statistical 
perspective, x t is said to Granger-cause y t if the prediction error of y t , conditioning 
on the past values of both series, is less than considering solely the past values of y t . 
In other words, the past values of x t contains relevant information to improve the 
predictions of y t . Note that Granger causality concept is not equivalent to classical 
aristotelian causality, since the former is based solely on prediction errors. However, 
due to its simplicity, it may be applied to identify possible effective causalities. 

One possible approach of using VAR models for Granger causality detection is 
by performing statistical tests on B/s coefficients. Considering y t equation, if there 
is at least one coefficient multiplying the past values of x t which is not equal to zero, 
then x t is said to Granger-cause y t . Thus, this procedure involves the estimation of 
Bj, their respective covariance matrices, and the application of hypothesis testing. 

In general, many physical, biological and chemical variables have the measure- 
ment process subject to random effects and it is very common analyze them by 
using models assuming that these measurement errors are negligible. It may bring 
up undesirable features as biased estimates as well as their standard errors and, as a 
consequence, dangerously false confidence intervals and hypotheses testing will often 
be obtained using such approach. Thus, it is necessary to consider the measurement 
error on the modeling of these type of time series. 

In this paper, we study a VAR model with main concern on including measure- 
ment errors. Let z t be the true variable that is not directly observed, instead a 
substitute variable Z t is observed which has an additive structure given by 



where Z t = (Z u , Z 2 t, ■ • • , Z pt ) T is the observed vector and e t = (ei t , e^t, • • • , e P t) T 
is the measurement error vector with mean zero and variance-covariance matrix S e . 



Z t = z t + e t , t = 1, • • • ,n 



(4) 
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In most cases, if the usual conditional ML estimator is adopted for the observations 
subject to errors, i.e., replacing z t with Z t in the equation (1), the estimator of B will 
be biased (as can be seen in (6)). Therefore, in order to overcome this limitation the 
measurement errors should be included in the estimation procedure. Nevertheless, 
the model (1) with the equation (4) is not identifiable, since the covariance matrices 
of q t and e t are confounded when B = 0. It is easy to see that in the univariate 
AR(1), note that when r — p — 1 and b = we have: Zj = a + qi + ei with E(Zj) = a, 
7(0) = a 2 + a\ and ^y(h) = for all h 7^ 0. It is impossible to estimate a 2 and a 2 
separately by observing only Zi,...,Z n . This problem can be avoided by using 
previous knowledge about the variance of e t . 

This paper is organized as follows. Section 2 proposes consistent estimators for 
the VAR model with measurement errors and also presents the asymptotic distri- 
bution of the estimator of the elements of B. In Section 3, simulation studies are 
undertaken to investigate some aspects of the proposed estimators (rejection rates 
for a test of hypothesis, biases and mean square errors) also it is verified the impact 
by erroneously considering the usual model. We applied the models in a functional 
magnetic resonance imaging dataset in Section 4 and we finish the paper with con- 
clusions and remarks in Section 5. 

2 VAR WITH MEASUREMENT ERRORS 

In the presence of measurement errors, the conventional ML estimation of VAR 
models produces biased estimators and they can lead to wrong statistical inference 
(see Fuller, 1987, in which it is found a discussion over errors-in-variables in regres- 
sion models). There are some studies about measurement errors in times series (e.g., 
Geweke, 1977; Aigner et al., 1984). Those studies use Kalman filtering methodology 
and an Expectation and Maximization algorithm that requires intensive iterative 
procedures. Maravall and Aigner (1977) have provided a careful expose of the iden- 
tifiability of some time series models with errors in variables. Beck (1990) describes 
approaches based on state space modeling and Kalman filtering and demonstrates 
the usefulness of these tools in dynamic models. Kellstedt et al. (1996) show the 
efficiency gains adopting an errors-in-variables model, and the precision of Kalman 
filter estimates in the face of autocorrelation. These measurement techniques have 
been applied to a variety of substantive problems, including dynamic representation, 
social problems (such as racial inequality), monetary policy and public entrepreneur- 
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ship (Citar). 

These state space models can be attractive alternatives to conventional VAR 
modeling. However, in practice, the implementation of the estimators are not de- 
scribed in analytical form, but by interactive algorithms or numerical optimization 
solutions. In addition, the derivation of estimators convergence, standard errors, 
consistence and asymptotic distribution may be complex in these cases. In Shumway 
and Stoffer (2000), the section on state space methods shows an alternative proce- 
dure for how to estimate B, S and S e under model (1) with the error equations (4), 
using the EM algorithm. Hannan et al. (2003) proposed another iterative procedure 
to estimate these parameters. Nevertheless, as the main goal of this paper is to 
test Granger causality, these approaches can not be used, since the model becomes 
unidentifiable under the hypothesis B = 0. 

In this study, we provide simple and closed forms for the estimators when S e is 
known, which allows the direct derivation of their respective asymptotic properties. 
Since the main concern of several practical applications is Granger causality testing, 
this information is essential to data analysis. In this section, the main concern is the 
parameter estimation and its asymptotic properties. Theorem 1 states consistent 
estimators for the model parameters and Theorem 2 establishes the asymptotic 
distribution for the estimator of vec(B T ) given in Theorem 1, where vec(C) is an 
operator that heaps the columns of the matrix C. 

The methodology presented in this section is based on correcting the asymptotic 
bias of conventional ML estimator caused by the measurement error effect. The 
outcome is a consistent estimator with good asymptotic properties such as normality. 
The estimators and the asymptotic covariance matrix for the proposed estimator 
of vec(£? T ) are computed easily and no iterative procedure is required. We must 
remark that those estimators are not the conditional ML estimators nor the ML 
estimators taking into account the measurement errors which are very complicated 
to reach by maximizing the likelihood, even under normality of the errors. 

Theorem 1. If e t ~ A/"(0, S e ) with S e known. Then, the parameters of the model 
(1) under measurement errors as in (4) have consistent estimators given by 






and 
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S = n' 1 J2( Zi ~ S ~ - 2 - # z *-i) T - S e - B(/ r <g> S e )S T 

i=l 

where ZVi = n" 1 E, ^-1, ^ = n" 1 E. ^ ^ = n" 1 E;(^*-i - Z m t-i)ZU T 
and S ZUZt = n- 1 " ^Vi)^ T . 

The proof of Theorem 1 can be found in Appendix A.l. Notice that, if S e = pxp , 
that is, when there is no measurement error, then the estimators of Theorem 1 
become the conditional ML estimators presented in (3). Also, it can be seen that 
the conditional ML estimator of B from the model (1), without considering the 
errors (4), is given by 

Bml = 

which is not consistent, since 

Bml B[I pr + (I r <g) E e )rV(0) -1 ] -1 . (6) 

The main steps to demonstrate (6) is given in Appendix A.l, in which is sufficient 
to compute the limit of Sz«_ and Sz*_ x z t - The quantity Sz* has two sources 
of variations, one that refers to the unobservable variable z*_ x and another one 
that refers to the measurement error. If the measurement error is huge and the 
sample size is not large enough, the quantity {Sz*_ Y — I r ® E e ) may not be positive 
definite and the estimator B, presented in (5), will be inadmissible. If the quantity 
(S z * _ x — J r <8>5] e ) has at least one eigen value close to zero the estimator B, presented 
in 5, will be unstable (because the computation of a matrix inverse requires all eigen 
values to be different from zero). If the matrix S e is well specified, one way to avoid 
such inadmissibility and instability is increasing the sample size. 

In many practical applications, there is some interest on testing some elements 
of the matrix B (e.g., the so called Granger causality test). However, the exact 
distribution of vec(B T ) is hard to compute. Thus, one can use its asymptotic distri- 
bution to build confidence regions and hypothesis testing as an approximation when 
the sample size is finite. The Theorem below gives us the asymptotic distribution 
of vec(B T ). 

Theorem 2. If e t ~ A/"(0, S e ) with S e known and E(q i j 1 q i j 2 q i j 3 q i j 4 ) < oo for all 
Ji,j2,j3,j4 £ {!>•••?£>}; where qij is the j th element of qi. Then, the asymptotic 
distribution of vec(B T ) obtained in Theorem 1 is given by 



S z ^Szi_ x z t 
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^(vec(B T ) - vec(B T )) JV(0, <£), (7) 
where the p 2 r x p 2 r matrix <fr given by 

$ = e„ ® rv(o)- 1 + (i p <g> r^oy^Ariip <g> iv(o)- 1 ) 

A r = <g> (I r <g> S c ) + B T <g> [£ e B(J r <g> S c )] + 

- ^ |(B ft s e ) ® r r (/i) + (s e B^) ® r r (-/i) I + 

r-1 

+ ^ [B(J-h <S> S e )i? T ] ® r r (/i). 

h=l-r 

and S$ = S + S e + B( J r <g> Ti e )B T , where Ji is a (r x r) matrix of zeros with one 's 
in the \l\ th diagonal above (below) the main diagonal if I > (I < 0) and J is a 
(r x r) matrix of zeros. 

The proof of Theorem 2 can be seen in Appendix A. 2. For all r and S e = 
we have <fr = X <g) r r (0) -1 , as given in Liitkepohl (2005). The normal distribution 
assumption for the measurement error is required to compute the expectation of 
polynomial functions (until forth degrees) of the elements of e t . Notice that, if 
r = 1 we have the VAR(l) model and the asymptotic covariance simplifies to 

$ = Etf® 7(D)- 1 + (I p <g> 7 (0)- 1 )A 1 (J ?) <g> 7(0)-^ 

where 

A 1 = <g> S e + B T <g> (E e BS e ) - [(BE e ) <g> (j(0)B T ) + (£ e B T ) <g> (B 7 (0))]. 

The i th element of vec(-B T ), is asymptotically normally distributed with standard 
error given by the square root of i th diagonal element of Thus, we can obtain 
hypotheses tests on the individual coefficients, or more general form of contrasts 

H : Cvec(B T ) = d Versus H x : Cvec(B T ) ^ d, 

which involves coefficients across different equations of the VAR model. Thus, 
Granger causality testing can be carried out by adequately specifying this con- 
trasts matrix. An illustrative example is the case of series x t and y t , in which we 
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are interested in evaluating the Granger causality from x t to y t in an r-order VAR 
model. The matrix C has r rows, one for each coefficient related to the past values 
of Xt in the yt equation. Considering that each column of C refers to each VAR 
coefficient, the contrast matrix is specified by simply setting 1 to the cell at the 
respective column and row for the x t coefficients in y t equation. This may be tested 
using the Wald-type statistic conveniently expressed as 

n{Cvec{B T ) - d) T [C$C T ] ~\Cvec{B T ) - d) (8) 

Under the null hypotheses, (8) has a x 2 ( m ) distribution in the limit, where 
m = rank(C) gives the number of linear restrictions. 

The above study can also be developed to the intercept model estimator, it can be 
found by applying the delta method (Lehmann and Casella, 1998) in the asymptotic 
distribution of (Zj , Z*J ± , vec(B T ) T ), since a = Z t - {I ® Z*J l )vec{B T ). Although, 
this asymptotic distribution is important to test hypotheses regarding the model 
intercept, it is outside the main scope of this article and does not have any impact 
on the Granger causality, for this reason we skip it. 



3 SIMULATION RESULTS 

In this section we conduct some simulation studies in order to evaluate the adequacy 
of the asymptotic distribution of vec(B T ) for small and moderate samples sizes. 
Computations were performed using the software R (www.r-project.org). 

For each setup of parameters and sample sizes, we considered 15,000 Monte Carlo 
samples generated from a VAR(l) model with measurement errors, given by 



Z2,t 

Zi,t 
Z 2 ,t 



a i 

a-2 

Zl,t 



+ 



hi &i2 ( zi,t-i\ f Qu 

hi h2 \ Z 2,t-lJ \Q2t 

K e 2t/ 

In all samples, we have considered the following setup of parameters: a\ = a 2 = 1, 
hi = b 22 = 0.5, 



(9) 
(10) 



S = 



10 5 

5 5 
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where the vector parameters values of (&i 2 , b 2 i) were the values of the set {(612, &21); &12 £ 
5 and fe 2 i £ 5*}, where 5 = {—0.4, —0.2,0.0,0.2,0.4}, the variance of the measure- 
ment error e t was S e = 2J 2 , and the size samples n = 50, 100, 250, 500. 

The rejection rates of the hypothesis H Q : b\ 2 = 621 = (i.e., z 2>t -i does not 
help to explain ;z ljt and Zi,t-i does not help to explain z 2 ^) are shown in Table 1, in 
which the test sizes are the rejection rates under the null hypothesis (that appears 
in bold). The Wald-type statistics (8) is used at 5% nominal level. From this table 
we have that, the test sizes from the proposed model are closer to the nominal level 
(5%) as compared to the usual approach for all sample sizes. Furthermore, when n 
increases the test sizes for the usual model also increase and, consequently, they do 
not converge to the adopted nominal level. This is a somewhat expected behavior 
because the usual approach produces biased estimates and standard errors. Table 1 
depicts the power of the test in each methodology, which shows a good performance 
of the proposed approach. Nevertheless, it is not possible to compare the power 
between the two methods because they have different empirical test sizes. 

[[ Table 1]] 

We observe that, the results shown in Table 1 are similar for other values of 
the parameters a and B, if we maintain the same proportionality of £ and S e as 
defined above. But, our simulations suggest that the larger the measurement error, 
the larger the sample size required to have a good asymptotic approximation for the 
Wald-type statistics (8). 

We also conduct simulation studies for testing the simple hypothesis H : 6 12 = 
at 5% nominal level. In this study, we keep fixed the value of 621 = 0.2. Others 
simulations were built considering others values for b 2 i, however, the results are close 
to each other and, for this reason, we omit them. As can be seen, Tables 1 and 2 
present similar behaviors, i.e., the proposed model has always empirical size test 
closer to the nominal level than the usual model. 

[[ Table 2]] 

In Table 1 and 2, the usual approach seems to be most powerful than the proposed 
approach when b 2 \ = 0.2 and b 21 = 0.4. However, as aforementioned, they can not 
be compared directly, just because the real nominal level used to compute that 
powers are not the same. Thus, we used a descriptive measure in order to analyze 
both methodologies around the null hypothesis. Let a n (a) be the probability of 
the error type I using the true distribution of (8) when the sample size is n and a 
is the adopting nominal level based on its asymptotic distribution. For instance, 
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in Table 2 we have estimated aToo(0.05) = 0.0513 for the proposed approach and 
5ioo(0.05) = 0.0837 for the usual approach. An expected behavior for good statistics 
is a n (a) a which means that the quantiles of the true distribution of (8) will be 
close to the quantiles of the asymptotic distribution, x 2 (m), when the sample size 
is sufficiently large. Thus, the relation a n (a)/a tell us how far is the a-quantil of 
the asymptotic distribution from the true distribution of (8) for each n. Therefore, 
we can define a sort of corrected power as 

p{c)( v = PnK(a)) 
n 1 ) (a n (a)/a) 

where P n (a(a)) is the power using the true probability of the error type I, namely 
a n (a). We are just penalizing the power by the distance between a n (a) and a. 
Notice that, the power under the null hypothesis has to be the nominal level and for 
comparing powers from different statistics it must be done using the same nominal 
level. Let a-y n {a) and a 2n (a:) be the true probability of the error type I for two 
different statistics when the sample size is n. Then, under the null hypothesis, we 
have 

pg(<*) = p£\ a ) = a , 

and hence, the corrected powers P[^ and P^ are comparable. Moreover, under an 
alternative hypothesis and when n increases, an expected behavior of Pn\a) is to 
converge to one. Although, this corrected power is not a monotonic function of the 
sample size nor of the nominal level, we believe that it is a kind of descriptive measure 
to evidence how unsuitable is the usual model when compared with the proposed 
one outside the null hypothesis. Furthermore, the proposed corrected power varies 
between and infinity. Figure 1 shows the corrected power for both approaches, the 
null hypothesis was H Q : 6 12 = 0. The full line refers to the proposed approach and 
the dashed line refers to the usual one. The panels (a.l), (b.l), (c.l) and (d.l) refer 
to the corrected power when the alternative hypothesis are b\2 = —0.4, b\2 = —0.2, 
bi2 = 0.2 and b\2 = 0.4, respectively at a — 0.01. The panels (a.2), (b.2), (c.2) and 
(d.2) refer to the corrected power when the alternative hypothesis are b\2 = —0.4, 
bu = —0.2, bu = 0.2 and b 12 = 0.4, respectively at a — 0.05. The panels (a. 3), 
(b.3), (c.3) and (d.3) refer to the corrected power when the alternative hypothesis are 
bu = —0.4, b 12 = —0.2, bi2 = 0.2 and byi = 0.4, respectively at a = 0.10. We observe 
in all graphs that, the usual approach has the worst performance (going to zero 
when the sample size increases) while the proposed one have an expected behavior 
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for a good statistic (going to one when the sample size increases). In general, the 
corrected power under the usual methodology goes to zero because the distance 
between a n (a) and a increases much faster than the uncorrected power, P n (a n (a)), 
when n increases. This behavior is still true for another setup of parameters. 

[[ Figure 1]] 

[[ Table 3]] 

Table 3 shows that, for this set of parameters, the biases of the estimators of fry 
= 1, 2) from the proposed model is smaller than the value supplied by the usual 
model. Moreover, the larger the sample size, the smaller the bias and MSE under 
the proposed model (this does not happen for the usual approach). 

4 APPLICATION 

As previously described, the models with measurement errors have great relevance 
in applied sciences, since equipment imprecisions are inherent to data acquisition. 
Actually, the usual models are commonly applied ignoring these errors. Nowadays, 
the scientific community started to pay enough attention to the fact that these pro- 
cedures may lead to spurious results. In this section, we illustrate the concepts intro- 
duced in the present study with an application embedded in Neuroscience research, 
with the utilization of VAR modeling for the characterization of brain networks. 

The dataset explored in this application is proceeding from a functional mag- 
netic resonance imaging (fMRI) experiment. Basically, fMRI acquisition is based on 
monitoring the BOLD signal (blood oxygenation level dependent) at several brain 
regions through time. One of the main advantages of fMRI over other imaging tech- 
niques is its non-invasive protocol and relative high spatial resolution. The BOLD 
signal is related to oxygen consumption and blood flow, being considered as an in- 
direct measure of local neural activity (Logothetis et al. (2001)). Regarding this 
property, this signal is used to quantify and locate the brain activity in humans. 

In this study, the BOLD signals at four brain regions from a subject in a resting 
state (eyes closed) condition were considered. The data was collected in a Siemens 
3Tesla MR system (TR=1800ms, TA=900ms, TE=30ms). The selected brain re- 
gions were: left primary motor cortex (left Ml), right primary motor cortex (right 
Ml), supplementary motor area (SMA) and right cerebellum. The anatomical loca- 
tion of this areas is shown in Figure 2. These areas are frequently involved in active 
and planned right hand fingertapping. We aim to evaluate the information flow 
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between these areas in a resting state condition by using VAR models for Granger 
causality identification. 

A well described limitation inherent to all fMRl acquisition is the high level of 
scanner noise. Thus, the signals observed mirror not only the physiological variations 
but also includes measurement errors. For this specific dataset, it was estimated that 
the error composed approximately 34.60% of the observed time series standard de- 
viation. For simplicity, each observed series were normalized to have mean zero and 
variance one. Thus, the measurement error was considered to be serially uncorre- 
lated, independent of the latent variables and with a standard deviation of 0.346. 

The model considered for the latent variable is given by 

z t = a + Bi«t_i + q t , t = l,---,n (11) 

where n = 200 is the time series length, z t = (z u , z 2 t, z 3 t, ZuV with z\ t '■ the Left 
Ml BOLD signal, z 2t : the SMA BOLD signal, z 3t : the Right Ml BOLD signal and 
z 4t : the Right cerebellum BOLD signal; Bi is the (4 x 4) autoregressive coefficients 
matrix 



fbn 


bu 


bi3 


bu\ 


621 


b 22 


b 23 


b 2 4 


&31 


&32 


b 33 


b 3 4 


\&41 


&42 


h 3 


644/ 



(12) 



and q< is an (4 x 1) unobservable zero mean white noise vector. The observed 
variables are given by 

Z t = z t + e t , t = 1, • • • ,n (13) 

where Z t = (Z lt , Z 2t , Z 3t , Z it ) T and e t = (e U) e 2t) e 3t , , e it ) T is the measurement error 
vector. 

The time series plots corresponding to the respective observed BOLD signal at 
each brain region are represented in Figure 3. Since we are interested in identifying 
the links of connectivity networks using Granger causality, the statistical inferences 
are related to the parameters bij = 1,2,3,4). If bij 7^ 0, then there is a in- 
formation flow from brain area j to area % (Baccala and Sameshima (2001)). The 
coefficient estimates, standard errors and p-values (H : b^ — vs Hi : b^ 7^ 0) for 
both usual and proposed approaches are shown in Tables 4 and 5, respectively. 
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[[ Figure 2]] 
[[ Figure 3]] 
[[ Figure 4]] 
[[ Figure 5]] 
[[ Table 4]] 
[[ Table 5]] 

The results described in Tables 4 and 5 suggest the existence of bidirectional 
information flow between Left Ml and Cerebellum. However, the application of 
usual approach indicates also that Left Ml sends information to SMA and Right 
Ml, and that the latter sends to SMA. For both usual and proposed approaches, 
the diagrams of the networks at the significance level of 5% are shown in Figure 4. 
As highlighted by the simulations results, the utilization of usual VAR estimation, 
ignoring the measurement errors, may result in wrong test nominal sizes. In this 
context, it is important to mention that the main differences between the usual 
and proposal results were on standard deviation estimates. Further, the proposal 
estimates are almost twice the values resulting from usual approach. The theory and 
simulations suggest the existence of biases in the latter. Consequently, the p-values 
from the usual method tend to be underestimated, resulting in high rejection rates. 
Note that this connections may possibly exist, but since the nominal level of the test 
is "incorrect", the type I Error is not under control. In addition, note that some 
coefficients were considerably underestimated, for example bu, 622 and 633. Finally, 
the qq-plots represented in Figure 5 suggest that the probability density of residuals 
Zt — Z t are reasonably approximated by the Normal distribution. 

Some studies (Biswal et al (1995)) suggest the existence of functional networks 
between motor areas even in resting state condition. These studies are based on 
correlation analysis between the BOLD signal at different brain sites. First, it is 
important to note that Granger causality is conceptually different from correlation, 
which is symmetric (it does not provide the direction of information flow ), evaluated 
in a pairwise fashion (and not in the full multivariate sense) and it does not take 
into account temporal information. In fact, correlation analysis is more closely 
related to instantaneous Granger Causality concept, which can be useful to quantify 
simultaneity between time series but it is unsuitable in the context of information 
flow detection. Second, the usual correlation analysis does not consider the presence 
of measurement errors, which may also affect the statistical significance of results. 
The nature of functional networks in resting state is still unclear and is the subject 
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of several studies (Long et al. (2008)). Nevertheless, we have demonstrated in this 
study that the inclusion of measurement errors can considerably influence the final 
results. Thus, the development of novel approaches dealing with this artifact is 
necessary. 

In summary, since the proposal and usual results differ, we conclude that the 
presence of measurement error cannot be ignored. An adequate treatment for this 
artifact is essential for the adequate description and modeling of brain networks. It is 
surprising that this important limitation received proper attention only recently. We 
believe that a preliminary analysis of this problem points toward the demand for the 
development of new estimation procedures regarding scanner noise characterization, 
physiological noise and computational implementation. 

5 CONCLUSION 

This paper has introduced a new approach to model multivariate times series when 
measurement errors are present. The simulation studies indicate that the proposed 
approach gives coherent results (test size close to the nominal level even for small 
samples, power increasing with the sample size under alternative hypotheses, biases 
and mean square errors decreasing when the sample size increases) under small and 
moderate measurement error. Such features seem no to be shared by the conven- 
tional maximum likelihood estimators which presents a much poorer performance. 
Furthermore, the proposal is easily attained and iterative procedures are not re- 
quired. The theory, simulations and application showed that the presence of mea- 
surement error cannot be neglected and a proper model has to be considered for 
the adequate description and modeling of brain networks. We expect to report 
generalizations of the proposed model (for elliptical errors and heteroscedasticity 
situations), a residual study and more simulation studies for large measurement 
errors on incoming papers. 

A PROOF OF THEOREMS 

A.l Proof of Theorem 1 

In order to prove the consistence of the estimators stated in Theorem 1, namely 
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a 



t-i, 



B 



\ Sz* . — Ir 



4 5; 



and 



^ = n ~ l J2( Z * - S " ^ Z *-i)(^ -a- BZUY - S e - B(J r ® E e )£ T , 

i=l 

we must study the limits of the quantities Sz*_ v Sz*_^z t i %*t-\ and Z* t when 
the sample size goes to infinity. Note that Z*_ x = z*_ x + e*_ 1: where e*_ x = 
(ej_ 1: . . . , eJ_ r ) T , and under the stationary conditions of a VAR(r) model we have 
that 



s *r-i = n _1 E( z r-i-^*-i) z S 

i=l 
n 

= ^ E«i + e *-i - z *'-i - e Vi)«! + e^jT 
1=1 

= r r (o) + / r ®s e + o p (n- 1/2 ), 

where <^ e *_ l = n _1 X)"=i e i-i e i-ii an d O p (n -1 / 2 ) means limited in probability even 
multiplying by n 1 / 2 (it happens with the crossing product in the above expression). 
That is, Sz* t _ x — > T r (0) + I r ® S e . Following the same scheme, we have that 

n 

Sz Ll z t = n-^^-ZVi)^ 

i=i 
?i 

= ri' 1 J2( z *-i + <-i " z Vi - e"Vi)(2i + e t ) T 
i=i 

= r r (0)B T H-O^n- 1 / 2 ), 
and finally, both the quantities Z* t -i and Z* t converge in probability to /x*. Hence, 

(S zu -I r ® Se)' 1 ^.(O)- 1 and r r (0)£ T , 
thus, the probability convergence of a, and S to a, B and £ follow, respectively. 
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A. 2 Proof of Theorem 2 

The proof idea has three steps. The first step consists in show that vec(B T ) — 
vec(.B T ) can be written as linear combinations of a vectorial mean. The second one, 
we must demonstrate that this vectorial mean has an asymptotic normal distribu- 
tion. The last step must conclude that vec(i? T ) — vec(i? T ) also has an asymptotic 
normal distribution. In order to prove Theorem 2, we need some auxiliary results, 
which are exposed in two propositions below. 

Proposition 1. Under the model (1) and (4), the proposed estimator B has the 
following relationship 

vec(B T ) - vec{B T ) = (I p <g> T^O)™ 1 )^ + O^ 1 ), 

where 



n 

W = n- 1 J2 



n 

,-J " 

i=l 



= n ^VW,; 



V w qi J 

withWi = (q i +e i -Be?_ 1 )®(z?_ 1 -ti, m +e?_ 1 )- 1 9? andV = [I p ®(I r ®'Z e )}vec(B T ). 

Proof: Define B k as a vector (rp x 1) of coefficients associated with the k th element 
of the vector z t) that is 

Zkt = Ofc + R.k z t-\ + Qkt- 

Thus, we have that vec(-B T ) = (B~[, B 2 -> • • • , Bp) T and the estimator of Theorem 
1 for it can be written as vec(B) = {B\,B T 2 , - ■ ■ , B^) T , where B k = (Sz*^ — 
I ® H e )^S zt _ iZkt and S ZLiZkt = n-'Zl^ZU - Z* t _ x )Z kt for k = 1, . . . ,p. 
Moreover, the model (2) may be rewritten in terms of the observed variables as 

Z t = a + BZ?_ 1+ # t , 
& t = q t + e t - Be*_ 1: 

and for the k th element of Z t we have 

Zkt = «fc + B\Z*__ X + d kt) 
#kt = Qkt + e kt - B\e*_ v 

Then, it follows that 

n 

Sz Ll z k = n~ l Y,( Z t-i ~ Z*t-i)(a k + B\Z*_ X + **) = S z * t _B. k + S z ^ k , 
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where S z ^ k = n' 1 ELi^-i " Z m t-i)#ki = ^EIU^-i ~ M* + + 
O p (n~ 1 ). Hence, denoting S z *_^ k = n' 1 Er=i( z i°-i ~ A** + e *-i)$fci we have that 

Sz*_,z k = (S ZLi - I r <g> S e )£. fc + 5^_ ltft - + O^n" 1 ), 
with = —I r ® S e I? fe . As a result, we have 

where W" fe = E£=i W fe i and W ki = {z*_ x - /x* + e*^)^ - Hence, it follows 
that 



vec^ 1 ) - vec(S l 



{Ip^TriOy^W + Opin- 1 ), 



where 



( W u \ 



\ W qi J 



i=l 



with Wi = (qfi+e i -Be?_ 1 )(8)(z;_ 1 -/i*+e?_ 1 )-* and * = [I p ® (I r ®S e )]vec(£ T ). 

Proposition 2. // e t ~ A/"(0, S e ) wjffi S e known and E(q i j l qij 2 qij 3 q i j 4 ) < oo for 
all ji, 32-ijz-iH e {!;•••?£>}; where qij is the j th element of qi. The mean, W, of 
Proposition 1 has an asymptotic distribution given by 



\/nW jV(0, T r ) , 



where 



T r = <g> r r (0) + <g> (I r <g> S e ) + B T <g> [Z e B(I r <g> S e )] + 

- £ |(B h s e ) ® r r (/i) + (s e Bj) ® r r (-/i) I + 

r-1 

+ [B(J- h ®x e )B T ]®r r {h). 

h=l-r 

where J\ is a (r x r) matrix of zeros with one's in the \l\ th diagonal above (below) 
the main diagonal if I > (I < 0) and J is a (r x r) matrix of zeros. 

Proof: Notice that the expectation of Wj is equal to zero for all i. Shumway 
and Stoffer (2000) state a central limit theorem to a univariate M-dependent se- 
quence of random variables with mean zero. We say that a time series x t is 
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M-dependent if the set of values x s ,s < t is independent of the set of values 
x s , s > t + M + 1 (Shumway and Stoffer, 2000, on pg. 66). Then, assuming that 
^(QijiQmQijsQm) < 00 for a11 3uh,h,h e I 1 , • • •,£>} where ^ is the j th element 
of qi and defining x = n" 1 Yli=i x i-> where Xi = S T Wi we have that E(xj) = 0, 
Cov^,^) = <TCov(W t , = ^^(WiWilJ* and 

^(WiWilJ = E[F ih ®(z*_ 1 - t x*)(z*_ h _ 1 - t x*) T } + E[F ih ®eUe*I h - 1 } + 
+ E[F lh ® cr_i« fc _i - aO T ] + ® (*;_! - M^etVj - 

with F ih = (qt + e { - Be*_ 1 )(q i - h + e;_ A - Be*_ h _ 1 ) T . Thus, using some matricial 
results and simple expectation rules we can solve these expectations as follows 

E(WiWj_ h ) = for \h\< r, 

E{W l Wl h ) = -{B r ll e )®T r (h) for h = r, 

E(WiW7 h ) = -(E e B^)®r r (h) for h=-r, 

E{W i WJ_ h ) = [B(J_ h ®X e )B T ]®r r (h)-(B h X e )®r r (h) for h = l,.. . ,r-l, 

S(WiW;IJ = [B(J_ ?i ®S e ) J B T ]®r r (/ i )-(S e B | ]; | )®r r (/ i ) for = -1, . . . , 1-r, 

^(WiW^) = S^®r r (0) + S tf ®(J r ®S e ) + B T ® [S e B(/ r ®S e )] for /i = 0, 

where J; is a (r x r) matrix of zeros with one's in the \l\ th diagonal above (below) 
the main diagonal if I > (I < 0) and J is a (r x r) matrix of zeros. That is, 
strictly M-dependent sequence of random variables with mean zero 
(where M = r) and, therefore, we can use the result stated in Shumway and Stoffer 
(2000), which says that 

^x W(0, V r ) 

where 

r 

y r = Cov(6 T Wi, 6 T Wi- h ) = 6 T T r 6 

h=—r 

with 
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T r = £„ <g> r r (0) + £,9 <g> (J r <g> S e ) + B T <g> [£ e B(J r <g> S e )] + 
-J]|(B ft Ee)(8)r r (/i) + (S c Bj)(8)r r (-/i)j + 

r-1 

+ ^ [B(J_/ l ® S e )B T ] ® r r (h). 

h=l-r 

As y^ 1 ^"^ is asymptotically normally distributed for all 6 ^ r then, by the 
Cramer- Wold device (see Theorem 10.4.5 on page 336 in Athreya and Lahiri, 2006), 
we have that 

\fnW A/"(0, T r ). 
Then, by the Propositions 1 and 2, the prove of Theorem 2 follows 

v^(vec(B T ) - vec(B T )) jV(0, [I p <g> r^O) -1 ]^^ <g> r r (0) -1 ]). 
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Table 1: Rejection rates (%) of the hypothesis H : b X2 = b 21 = (at 5% nominal 
level) using the Wald statistics (8) for n = 50, n = 100, n = 250 and n = 500. 
The bold numbers at the center are test sizes (they are expected to be 5%) and the 
numbers around them are empirical powers. 
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Table 2: Rejection rates (%) of the hypothesis H : b 12 = (at 5% nominal level) 
using the Wald statistics (8) for n = 50, n = 100, n = 250 and n = 500. 















Model 


-0.4 


-0.2 


0.0 


0.2 


0.4 


n = 50 












Proposed Model 


43.80 


14.43 


5.27 


6.95 


13.55 


Usual Model 


36.79 


10.16 


6.94 


17.49 


33.24 


71= 100 












Proposed Model 


71.65 


21.91 


5.13 


10.79 


25.17 


Usual Model 


61.90 


12.39 


8.37 


32.27 


60.75 


n = 250 












Proposed Model 


97.58 


43.04 


5.02 


21.53 


55.45 


Usual Model 


94.43 


21.52 


13.71 


68.83 


95.32 


n = 500 












Proposed Model 


99.96 


70.57 


4.94 


39.51 


84.58 


Usual Model 


99.88 


37.49 


24.89 


94.32 
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Table 3: Empirical bias and mean squared error for the proposed and usual model. 
Note that, the biases 



Proposed model Usual model 

Bias MSE Bias MSE 

n = 50 

fen -0.0100 0.0459 -0.1446 0.0454 

&i2 -0.0647 0.0809 0.1098 0.0434 

621 0.0290 0.0269 0.0250 0.0143 

b 22 -0.0765 0.0472 -0.1589 0.0461 
n = 100 

611 -0.0035 0.0203 -0.1313 0.0290 

612 -0.0335 0.0328 0.1209 0.0293 
621 0.0127 0.0115 0.0165 0.0067 
fe 22 -0.0343 0.0183 -0.1265 0.0258 
n = 250 

611 -0.0022 0.0075 -0.1252 0.0203 

612 -0.0118 0.0112 0.1299 0.0224 

621 0.0040 0.0043 0.0112 0.0027 

622 -0.0128 0.0063 -0.1086 0.0156 
n = 500 

611 -0.0019 0.0037 -0.1235 0.0175 

612 -0.0053 0.0054 0.1326 0.0203 
621 0.0018 0.0021 0.0097 0.0013 
b 22 -0.0057 0.0030 -0.1024 0.0124 
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Table 4: Application to real data - usual approach: coefficient estimates, 
standard deviations and respective p- values (H : coefficient is equal to zero). 



Parameter 
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Standard Deviation 


p— value 
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<0.001 


hz 


0.145 


0.063 


0.002 


hi 


0.047 


0.062 


0.442 


hi 


0.165 


0.076 


0.030 


h2 


-0.074 


0.074 


0.319 


h3 


0.242 


0.071 


<0.001 


hi 


-0.061 


0.069 


0.378 


hi 


0.294 


0.070 


<0.001 


bn 


-0.060 


0.068 


0.381 


h3 


0.092 


0.065 


0.154 


hi 


0.350 


0.064 


<0.001 
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Table 5: Application to real data - proposed approach: coefficient estimates, 
standard deviations and respective p- values (H : coefficient is equal to zero). 



Parameter 


Estimate 


Standard Deviation 


p— value 


hi 


0.935 


0.137 


<0.001 


hi 


-0.032 


0.127 


0.803 


h?> 


-0.095 


0.103 


0.357 


hi 


-0.287 


0.091 


0.002 


hi 


0.132 


0.137 


0.332 


hi 


0.581 


0.126 


<0.001 


hz 


0.199 


0.103 


0.053 


hi 


0.027 


0.092 


0.765 


hi 


0.279 


0.156 


0.073 


h2 


-0.184 


0.143 


0.201 


h3 


0.346 


0.117 


0.004 


hi 


-0.111 


0.106 


0.294 


hi 


0.538 


0.147 


<0.001 


h2 


-0.252 


0.135 


0.063 




0.044 


0.110 


0.687 


6 44 


0.528 


0.099 


<0.001 
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Figure 1: Corrected power versus sample size. The full line refers to the proposed 
approach and the dot line refers to the usual one. It is expected that the corrected 
power converges to one. 



29 



Figure 2: Four areas were selected for connectivity evaluation using the VAR model: 
Left Ml: left primary motor cortex, Right Ml: right primary motor cortex, 
SMA: supplementary motor area and Right Cerebellum. 
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Figure 3: Observed signal at each brain region. 



Proposed Approach Usual Approach 




Figure 4: Identified network of information flow by testing the parameters of VAR 
model (a = 5%) 
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QQplot - Left M1 



QQplot - SMA 





Figure 5: QQplot for Normal distribution: Residuals (Observed values - Pre- 
dicted) at each brain region. 
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